28 research outputs found

    Finding Street Gang Members on Twitter

    Most street gang members use Twitter to intimidate others, to present outrageous images and statements to the world, and to share recent illegal activities. Their tweets may thus be useful to law enforcement agencies to discover clues about recent crimes or to anticipate ones that may occur. Finding these posts, however, requires a method to discover gang member Twitter profiles. This is a challenging task since gang members represent a very small population of the 320 million Twitter users. This paper studies the problem of automatically finding gang members on Twitter. It outlines a process to curate one of the largest sets of verifiable gang member profiles that have ever been studied. A review of these profiles establishes differences in the language, images, YouTube links, and emojis gang members use compared to the rest of the Twitter population. Features from this review are used to train a series of supervised classifiers. Our classifier achieves a promising F1 score with a low false positive rate. Comment: 8 pages, 9 figures, 2 tables, Published as a full paper at 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016)

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770

    A Semantics-Based Measure of Emoji Similarity

    Emoji have grown to become one of the most important forms of communication on the web. With their widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, we develop and test multiple embedding models to measure emoji similarity. To evaluate our work, we create a new dataset called EmoSim508, which assigns human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation with EmoSim508, we present a real-world use-case of our emoji embedding models using a sentiment analysis task and show that our models outperform the previous best-performing emoji embedding model on this task. The EmoSim508 dataset and our emoji embedding models are publicly released with this paper and can be downloaded from http://emojinet.knoesis.org/. Comment: This paper is accepted at Web Intelligence 2017 as a full paper, In 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). Leipzig, Germany: ACM, 201

    Repurposing emoji for personalised communication: Why [pizza slice] means “I love you”

    The use of emoji in digital communication can convey a wealth of emotions and concepts that otherwise would take many words to express. Emoji have become a popular form of communication, with researchers claiming emoji represent a type of “ubiquitous language” that can span different languages. In this paper, however, we explore how emoji are also used in highly personalised and purposefully secretive ways. We show that emoji are repurposed for something other than their “intended” use between close partners, family members and friends. We present the range of reasons why certain emoji get chosen, including the concept of “emoji affordance”, and explore why repurposing occurs. Normally used for speed, some emoji are instead used to convey intimate and personal sentiments that, for many reasons, their users cannot express in words. We discuss how this form of repurposing must be considered in tasks such as emoji-based sentiment analysis.

    A Framework to Understand Emoji Meaning: Similarity and Sense Disambiguation of Emoji using EmojiNet

    Pictographs, commonly referred to as ‘emoji’, have become a popular way to enhance electronic communications. They are an important component of the language used in social media. Since their introduction in the late 1990s, emoji have been widely used to enhance the sentiment, emotion, and sarcasm expressed in social media messages. They are equally popular across many social media sites including Facebook, Instagram, and Twitter. In 2015, Instagram reported that nearly half of the photo comments posted on Instagram contain emoji, and in the same year, Twitter reported that the ‘face with tears of joy’ emoji had been tweeted 6.6 billion times. As of 2017, Facebook and Facebook Messenger processed over 60 million and 6 billion messages with emoji per day, respectively. Emogi, an Internet marketing firm, reports that over 92% of all online users have used emoji at least once. Creators of the SwiftKey Keyboard for mobile devices report that they process 6 billion messages per day that contain emoji. Moreover, business organizations have adopted and now accept the use of emoji in professional communication. For example, Appboy, an Internet marketing company, reports that there has been a 777% year-over-year increase and 20% month-over-month increase in emoji usage for marketing campaigns by business organizations in 2016. These statistics leave little doubt that emoji are a significant and important aspect of electronic communication across the world. The ability to automatically process and interpret text fused with emoji will be essential as society embraces emoji as a standard form of online communication. In the same way that natural language is processed with sophisticated machine learning techniques and technologies for many important applications, including text similarity and word sense disambiguation, emoji should also be amenable to such analysis.
Yet the pictorial nature of emoji, the fact that the same emoji may be used in different contexts to express different meanings, and the fact that emoji are used by different cultures around the world, which can interpret them differently, make it especially difficult to apply traditional Natural Language Processing (NLP) techniques to analyze them. Indeed, emoji were developed organically with no overt/explicit semantics assigned to them. This contributed to their flexible usage but also led to ambiguity. Thus, similar to words, emoji can take on different meanings depending on context and part-of-speech (POS). Polysemy in emoji complicates determination of emoji similarity and emoji sense disambiguation. However, having access to machine-readable sense repositories that are specifically designed to capture emoji meaning can play a vital role in representing, contextually disambiguating, and converting pictorial forms of emoji into text, thereby leveraging and generalizing NLP techniques for processing a richer medium of communication. This dissertation presents the creation of EmojiNet, the largest machine-readable emoji sense inventory that links Unicode emoji representations to their English meanings extracted from the Web. EmojiNet consists of (i) 12,904 sense labels over 2,389 emoji, which were extracted from reliable online web sources and linked to machine-readable sense definitions seen in BabelNet; (ii) context words associated with each emoji sense, which are inferred through word embedding models trained over Google News and Twitter message corpora for each emoji sense definition; and (iii) recognition of discrepancies in the presentation of emoji on different platforms and specification of the most likely platform-based emoji sense for a selected set of emoji. It then discusses the application of emoji meanings extracted from EmojiNet to solve novel downstream applications including emoji similarity and emoji sense disambiguation.
To address the problem of emoji similarity, it first presents a comprehensive analysis of the semantic similarity of emoji through emoji embedding models learned over emoji meanings in EmojiNet. Using emoji descriptions, emoji sense labels, and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, multiple embedding models are learned to measure emoji similarity. Using a benchmark sentiment analysis dataset, it further shows that incorporating emoji meanings in EmojiNet into embedding models can improve the accuracy of sentiment analysis tasks by ~9%. To address the problem of emoji sense disambiguation, it uses word embedding models learned over Twitter and Google News corpora and shows that word embedding models can be used to improve the accuracy of emoji sense disambiguation tasks. The EmojiNet framework, its RESTful web services, and other benchmarking datasets created as part of this dissertation are publicly released at http://emojinet.knoesis.org/.

    A Keyword Sense Disambiguation Based Approach for Noise Filtering in Twitter

    In this paper, we describe an approach to filter out noisy data generated by keyword-based tweet filtering methods by performing Word Sense Disambiguation on the keywords used to collect tweets. We present the noise filtering problem as a binary classification problem and discuss our evaluation strategy, which is to be carried out in the future.

    Analyzing the Social Media Footprint of Street Gangs

    Gangs utilize social media as a way to maintain threatening virtual presences, to communicate about their activities, and to intimidate others. Such usage has gained the attention of many justice service agencies that wish to create better crime prevention and judicial services. However, these agencies use analysis methods that are labor intensive and only lead to basic, qualitative data interpretations. This paper presents the architecture of a modern platform to discover the structure, function, and operation of gangs through the lens of social media. Preliminary analysis of social media posts shared in the greater Chicago, IL region demonstrates the platform’s capability to understand gang members’ social media usage patterns.

    EmojiNet: An Open Service and API for Emoji Sense Discovery

    Emoji have grown to become one of the most important forms of communication on the web. With their widespread use, measuring the similarity of emoji has become an important problem for contemporary text processing since it lies at the heart of sentiment analysis, search, and interface design tasks. This paper presents a comprehensive analysis of the semantic similarity of emoji through embedding models that are learned over machine-readable emoji meanings in the EmojiNet knowledge base. Using emoji descriptions, emoji sense labels and emoji sense definitions, and with different training corpora obtained from Twitter and Google News, we develop and test multiple embedding models to measure emoji similarity. To evaluate our work, we create a new dataset called EmoSim508, which assigns human-annotated semantic similarity scores to a set of 508 carefully selected emoji pairs. After validation with EmoSim508, we present a real-world use-case of our emoji embedding models using a sentiment analysis task and show that our models outperform the previous best-performing emoji embedding model on this task. The EmoSim508 dataset and our emoji embedding models are publicly released with this paper and can be downloaded from http://emojinet.knoesis.org/.